OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A revised method to detect erroneous characters wrongly substituted, deleted, and inserted at the end position in Japanese sentences and ‘bunsetsu’s

Internal identifier: 000517 (Main/Exploration); previous: 000516; next: 000518

Authors: Chikahiro Araki [Japan]; Mikio Mori [Japan]; Shuji Taniguchi [Japan]

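Source: IEEJ Transactions on Electrical and Electronic Engineering (ISSN 1931-4973), vol. 6, no. 2, pp. 168-172, 2011-03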

RBID : ISTEX:BAD0D4B84B18B067C1F27DD2A75B94739E05307E

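English descriptors

Markov chain model; deletion error; error detection; insertion error; skipped Markov chain model; substitution error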

Abstract

A method based on an mth‐order Markov chain model was proposed earlier to detect characters wrongly substituted, deleted, or inserted in the interior of Japanese sentences and ‘bunsetsu’s, and it was found to be useful for detecting such errors. With that method, however, erroneous characters at the end position of sentences and ‘bunsetsu’s are difficult to detect, because their Markov chain probabilities do not stay below the critical value T the same number of times as for interior errors. This paper proposes a method that detects erroneous characters at the end position of sentences and ‘bunsetsu’s by using the ‘skipped Markov chain model’ in addition to the ‘connected Markov chain model’. Experiments with newspaper articles show that the proposed method is useful for correcting erroneous characters at the end position of sentences and ‘bunsetsu’s. © 2011 Institute of Electrical Engineers of Japan. Published by John Wiley & Sons, Inc.
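
The end-position difficulty described in the abstract can be illustrated with a toy sketch. It assumes (none of this is stated in the abstract) a character-level chain estimated from raw counts without smoothing, and a flag raised whenever a transition probability falls below a threshold. The function names, the toy corpus, and the threshold value 0.5 are hypothetical. With order m = 2, a substituted character in the interior takes part in m + 1 low-probability transitions, while the same substitution at the end takes part in only one, which is one plausible reading of why the number of values below T differs. The ‘skipped Markov chain model’ is not defined in the abstract and is not sketched here.

```python
from collections import Counter

M = 2            # chain order (assumed for illustration)
THRESHOLD = 0.5  # stand-in for the critical value T (hypothetical value)


def train_counts(corpus, m):
    """Count contexts and (context, next character) pairs for an m-th order chain."""
    ctx_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        for i in range(m, len(sentence)):
            ctx = sentence[i - m:i]
            ctx_counts[ctx] += 1
            pair_counts[(ctx, sentence[i])] += 1
    return ctx_counts, pair_counts


def chain_prob(ctx_counts, pair_counts, ctx, ch):
    """Relative-frequency estimate of P(ch | ctx); 0.0 for unseen contexts (no smoothing)."""
    total = ctx_counts[ctx]
    return pair_counts[(ctx, ch)] / total if total else 0.0


def low_prob_positions(sentence, m, ctx_counts, pair_counts, threshold):
    """Indices i >= m whose transition probability is below the threshold."""
    return [
        i
        for i in range(m, len(sentence))
        if chain_prob(ctx_counts, pair_counts, sentence[i - m:i], sentence[i]) < threshold
    ]


# Toy training data: Latin letters stand in for Japanese characters.
ctx_counts, pair_counts = train_counts(["abcde", "abcde", "abcde"], M)

# Substitution in the interior: 'c' replaced by 'X'.
interior_hits = low_prob_positions("abXde", M, ctx_counts, pair_counts, THRESHOLD)
# Substitution at the end position: 'e' replaced by 'X'.
end_hits = low_prob_positions("abcdX", M, ctx_counts, pair_counts, THRESHOLD)

print(interior_hits)  # three consecutive low-probability transitions
print(end_hits)       # a single low-probability transition
```

A detector that waits for a run of m + 1 consecutive low values would therefore miss the end-position case in this toy setting, since no later transitions exist to extend the run.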

Url: https://api.istex.fr/document/BAD0D4B84B18B067C1F27DD2A75B94739E05307E/fulltext/pdf
DOI: 10.1002/tee.20640

The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A revised method to detect erroneous characters wrongly substituted, deleted, and inserted at the end position in Japanese sentences and ‘bunsetsu’s</title>
<author>
<name sortKey="Araki, Chikahiro" sort="Araki, Chikahiro" uniqKey="Araki C" first="Chikahiro" last="Araki">Chikahiro Araki</name>
</author>
<author>
<name sortKey="Mori, Mikio" sort="Mori, Mikio" uniqKey="Mori M" first="Mikio" last="Mori">Mikio Mori</name>
</author>
<author>
<name sortKey="Taniguchi, Shuji" sort="Taniguchi, Shuji" uniqKey="Taniguchi S" first="Shuji" last="Taniguchi">Shuji Taniguchi</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:BAD0D4B84B18B067C1F27DD2A75B94739E05307E</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1002/tee.20640</idno>
<idno type="url">https://api.istex.fr/document/BAD0D4B84B18B067C1F27DD2A75B94739E05307E/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000B97</idno>
<idno type="wicri:Area/Istex/Curation">000B82</idno>
<idno type="wicri:Area/Istex/Checkpoint">000174</idno>
<idno type="wicri:doubleKey">1931-4973:2011:Araki C:a:revised:method</idno>
<idno type="wicri:Area/Main/Merge">000523</idno>
<idno type="wicri:Area/Main/Curation">000517</idno>
<idno type="wicri:Area/Main/Exploration">000517</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">A revised method to detect erroneous characters wrongly substituted, deleted, and inserted at the end position in Japanese sentences and ‘bunsetsu’s</title>
<author>
<name sortKey="Araki, Chikahiro" sort="Araki, Chikahiro" uniqKey="Araki C" first="Chikahiro" last="Araki">Chikahiro Araki</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Department of Human and Artificial Intelligence Systems, Graduate School of Engineering, University of Fukui, 3‐9‐1 Bunkyo, Fukui‐shi 910‐8507</wicri:regionArea>
<wicri:noRegion>Fukui‐shi 910‐8507</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Mori, Mikio" sort="Mori, Mikio" uniqKey="Mori M" first="Mikio" last="Mori">Mikio Mori</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Department of Information and Media Engineering, Graduate School of Engineering, University of Fukui, 3‐9‐1 Bunkyo, Fukui‐shi 910‐8507</wicri:regionArea>
<wicri:noRegion>Fukui‐shi 910‐8507</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Taniguchi, Shuji" sort="Taniguchi, Shuji" uniqKey="Taniguchi S" first="Shuji" last="Taniguchi">Shuji Taniguchi</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Japon</country>
<wicri:regionArea>Department of Information and Media Engineering, Graduate School of Engineering, University of Fukui, 3‐9‐1 Bunkyo, Fukui‐shi 910‐8507</wicri:regionArea>
<wicri:noRegion>Fukui‐shi 910‐8507</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">IEEJ Transactions on Electrical and Electronic Engineering</title>
<title level="j" type="abbrev">IEEJ Trans Elec Electron Eng</title>
<idno type="ISSN">1931-4973</idno>
<idno type="eISSN">1931-4981</idno>
<imprint>
<publisher>Wiley Subscription Services, Inc., A Wiley Company</publisher>
<pubPlace>Hoboken</pubPlace>
<date type="published" when="2011-03">2011-03</date>
<biblScope unit="volume">6</biblScope>
<biblScope unit="issue">2</biblScope>
<biblScope unit="page" from="168">168</biblScope>
<biblScope unit="page" to="172">172</biblScope>
</imprint>
<idno type="ISSN">1931-4973</idno>
</series>
<idno type="istex">BAD0D4B84B18B067C1F27DD2A75B94739E05307E</idno>
<idno type="DOI">10.1002/tee.20640</idno>
<idno type="ArticleID">TEE20640</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">1931-4973</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Markov chain model</term>
<term>deletion error</term>
<term>error detection</term>
<term>insertion error</term>
<term>skipped Markov chain model</term>
<term>substitution error</term>
</keywords>
</textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">A method to detect the erroneous characters wrongly substituted, deleted, and inserted at the interior location of Japanese sentences and ‘bunsetsu’s using mth‐order Markov chain model has been proposed earlier and was found to be useful in detecting these erroneous characters. However, with this method it is difficult to detect erroneous characters at the end position of Japanese sentences and ‘bunsetsu’s, because the Markov chain probabilities of erroneous characters at the end position of sentences and ‘bunsetsu’s, do not remain smaller than the critical value T the same number of times. This paper proposes a method to detect erroneous characters located at the end position of sentences and ‘bunsetsu’s using the ‘skipped Markov chain model’ in addition to the ‘connected Markov chain model’. From experiments with newspaper articles, the proposed method is shown to be useful to correct erroneous characters located at the end position of sentences and ‘bunsetsu’s. © 2011 Institute of Electrical Engineers of Japan. Published by John Wiley &amp; Sons, Inc.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Japon</li>
</country>
</list>
<tree>
<country name="Japon">
<noRegion>
<name sortKey="Araki, Chikahiro" sort="Araki, Chikahiro" uniqKey="Araki C" first="Chikahiro" last="Araki">Chikahiro Araki</name>
</noRegion>
<name sortKey="Mori, Mikio" sort="Mori, Mikio" uniqKey="Mori M" first="Mikio" last="Mori">Mikio Mori</name>
<name sortKey="Taniguchi, Shuji" sort="Taniguchi, Shuji" uniqKey="Taniguchi S" first="Shuji" last="Taniguchi">Shuji Taniguchi</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000517 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000517 | SxmlIndent | more

To add a link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:BAD0D4B84B18B067C1F27DD2A75B94739E05307E
   |texte=   A revised method to detect erroneous characters wrongly substituted, deleted, and inserted at the end position in Japanese sentences and ‘bunsetsu’s
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024